FaceX

Face verification that runs entirely in the browser, or on your server in 3 ms. No cloud needed.

Add face recognition to any app in minutes. Runs in the browser (74 KB WebAssembly) or on your server (3ms native C). Detects faces, aligns them, computes embeddings, compares. No server required for browser mode — photos never leave the user's device.

<!-- Browser: face verification in 3 lines -->
<script src="facex-sdk.js"></script>
<script type="module">
  // type="module" permits top-level await
  const fx = new FaceXSDK();
  await fx.load();
  const result = fx.verify(videoElement, referenceEmbedding);
  // { match: true, similarity: 0.87, ms: 17 }
</script>

// Native C: 3ms per face
#include "facex.h"
FaceX* fx = facex_init("weights.bin", NULL);
float emb1[512], emb2[512];
facex_embed(fx, face1_112x112, emb1);
facex_embed(fx, face2_112x112, emb2);
float sim = facex_similarity(emb1, emb2);

What can you build with this?

  • Identity verification — "is this the same person?" from selfie + ID photo
  • Face login — unlock apps by face, works offline, no data leaves the device
  • Access control — doors, gates, turnstiles on edge hardware without GPU
  • Proctoring — verify exam takers are who they claim to be
  • Smart cameras — recognize known faces at 300+ faces/sec on a single CPU core

How it works

FaceX detects faces, aligns them using 5 landmarks, and computes a 512-dim embedding. Compare two embeddings with cosine similarity: a score above 0.3 is treated as the same person. The model reaches 99.73% accuracy on the LFW benchmark.
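
Because the API describes embeddings as L2-normalized, cosine similarity between two of them reduces to a plain dot product. The following is a minimal sketch of that comparison and the 0.3 decision rule, not the library's implementation; `dot_similarity` and `is_same_person` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

#define EMB_DIM 512            /* embedding size stated in this README */
#define MATCH_THRESHOLD 0.3f   /* decision threshold stated in this README */

/* Dot product of two embeddings. This equals cosine similarity only
 * when both inputs are already L2-normalized (assumed here). */
static float dot_similarity(const float *a, const float *b) {
    assert(a != NULL && b != NULL);
    float dot = 0.0f;
    for (size_t i = 0; i < EMB_DIM; i++) {
        dot += a[i] * b[i];
    }
    return dot;
}

/* Returns 1 when the score is above the threshold, 0 otherwise. */
static int is_same_person(float similarity, float threshold) {
    return similarity > threshold;
}
```

The threshold is passed as a parameter so an application can trade false accepts against false rejects; 0.3 is simply the value this README quotes.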

Two modes:

  • Browser: 74 KB WebAssembly, 17ms pipeline, no server needed
  • Native: 148 KB C library, 3ms per face, faster than ONNX Runtime

Six months of optimization: handwritten AVX2/AVX-512 SIMD kernels, INT8 GEMM, cache-tuned layout — every millisecond fought for.


Benchmarks

Measured on Intel i5-11500 (6 cores, AVX-512 + VNNI):

Speed

| Engine | Median | Min | vs FaceX |
|---|---|---|---|
| FaceX | 3.0 ms | 2.87 ms | -- |
| ONNX Runtime 1.23 | 3.9 ms | 3.18 ms | 1.30x slower |
| InsightFace (R34) | 17 ms | -- | 5.7x slower |
| FaceNet (PyTorch) | 30 ms | -- | 10x slower |
| dlib | 50+ ms | -- | 17x slower |
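
The README does not describe its measurement harness. As a sketch of how Median and Min figures, and a "vs FaceX" ratio, could be derived from a list of per-call timings, the helpers below sort the samples and read off the statistics. `LatencyStats`, `latency_stats`, and `slowdown` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    double median_ms;
    double min_ms;
} LatencyStats;

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts the samples in place and returns their median and minimum. */
static LatencyStats latency_stats(double *samples_ms, size_t n) {
    assert(samples_ms != NULL && n > 0);
    qsort(samples_ms, n, sizeof(double), cmp_double);
    LatencyStats s;
    s.min_ms = samples_ms[0];
    s.median_ms = (n % 2)
        ? samples_ms[n / 2]
        : 0.5 * (samples_ms[n / 2 - 1] + samples_ms[n / 2]);
    return s;
}

/* Ratio of another engine's latency to the baseline latency. */
static double slowdown(double other_ms, double baseline_ms) {
    return other_ms / baseline_ms;
}
```

With the table's medians, `slowdown(3.9, 3.0)` gives 1.3, which matches the "1.30x slower" entry for ONNX Runtime.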

Accuracy

| Benchmark | Score |
|---|---|
| LFW verification | 99.73% |
| Model parameters | 1.77M |
| Embedding dim | 512 |

Footprint

| Metric | FaceX | ONNX Runtime |
|---|---|---|
| Library size | 148 KB | 28 MB |
| Total deploy | 7 MB | 157 MB |
| Dependencies | none | Python + onnxruntime |
| Cold start | ~100 ms | ~350 ms |

Quick start

C

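#include <stdio.h>
#include "facex.h"

int main(void) {
    // Load engine (one-time, ~100ms). facex_init returns NULL on error.
    FaceX* fx = facex_init("edgeface_xs_fp32.bin", NULL);
    if (!fx) {
        fprintf(stderr, "failed to load weights\n");
        return 1;
    }

    // Two aligned 112x112 faces: RGB, HWC, values in [-1, 1].
    // Fill these from your own detection/alignment step; they start zeroed.
    static float face_a[112 * 112 * 3];
    static float face_b[112 * 112 * 3];
    float emb_a[512], emb_b[512];

    // Compute embeddings (3ms per call)
    facex_embed(fx, face_a, emb_a);
    facex_embed(fx, face_b, emb_b);

    // Compare two faces
    float sim = facex_similarity(emb_a, emb_b);
    printf("similarity: %.3f\n", sim);
    // sim > 0.3 → same person

    facex_free(fx);
    return 0;
}

gcc -O3 -march=native -Iinclude -o myapp myapp.c -L. -lfacex -lm -lpthread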

Go

import "github.com/facex-engine/facex/go/facex"

ff, _ := facex.New(facex.Config{
    Exe:     "./facex-cli",
    Weights: "./edgeface_xs_fp32.bin",
})
defer ff.Close()

embA, _ := ff.Embed(rgbImageA) // handle the returned errors in real code
embB, _ := ff.Embed(rgbImageB)
sim := facex.CosSim(embA, embB)

CLI (any language via stdin/stdout)

# Pipe mode: reads 112x112x3 float32 HWC, writes 512 float32
./facex-cli weights.bin --server < faces.raw > embeddings.raw
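
Pipe mode expects raw float32 pixels rather than an encoded image, so a caller has to convert 8-bit RGB data first. The sketch below shows one way to do that. It assumes the linear mapping 0 → -1.0 and 255 → +1.0, since the README only states the [-1, 1] range, and it writes floats in the machine's native byte order. `rgb8_to_float_hwc` and `write_face_raw` are hypothetical helpers, not part of FaceX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define FACE_SIZE 112
#define FACE_FLOATS (FACE_SIZE * FACE_SIZE * 3)

/* Converts interleaved 8-bit RGB (HWC order) to float32 in [-1, 1].
 * Assumed mapping: x / 127.5 - 1, so 0 -> -1.0 and 255 -> +1.0. */
static void rgb8_to_float_hwc(const uint8_t *rgb, float *out, size_t count) {
    assert(rgb != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        out[i] = (float)rgb[i] / 127.5f - 1.0f;
    }
}

/* Writes one face as FACE_FLOATS raw float32 values.
 * Returns 1 on success, 0 on a short write. */
static int write_face_raw(FILE *fp, const float *face) {
    assert(fp != NULL && face != NULL);
    return fwrite(face, sizeof(float), FACE_FLOATS, fp) == (size_t)FACE_FLOATS;
}
```

A caller would convert each aligned 112x112 crop with `rgb8_to_float_hwc(pixels, buf, FACE_FLOATS)` and then pass `buf` to `write_face_raw(stdout, buf)` to build the `faces.raw` stream.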

Browser (WebAssembly)

<script src="facex.js"></script>
<script type="module">
// type="module" permits top-level await
const Module = await FaceXModule();
const fx = Module.cwrap('facex_init', 'number', ['string', 'string'])('/weights.bin', null);
// 7ms per face, runs entirely in browser, no server needed
</script>

48 KB WASM module. Face recognition with zero server infrastructure. See wasm/ for the full browser demo with live camera.


Build

make            # builds libfacex.a + facex-cli
make example    # builds and runs example
make encrypt    # builds weight encryption tool

Requirements: GCC with AVX2 support. Nothing else.

Cross-compile for Linux (from WSL)

# Compile objects, then archive them; link your app with -lm -lpthread
gcc -O3 -march=x86-64-v3 -mavx2 -mfma -DFACEX_LIB -Iinclude -c src/*.c
ar rcs libfacex.a *.o

API

// Initialize engine. Returns NULL on error.
// license_key: NULL for plain weights, or key string for AES-256 encrypted.
FaceX* facex_init(const char* weights_path, const char* license_key);

// Compute 512-dim face embedding from 112x112 RGB image.
// rgb_hwc: float32 array [112][112][3], values in [-1, 1].
// embedding: output buffer, 512 floats (L2-normalized).
int facex_embed(FaceX* fx, const float* rgb_hwc, float embedding[512]);

// Cosine similarity between two embeddings. Range [-1, 1].
float facex_similarity(const float emb1[512], const float emb2[512]);

// Free engine resources.
void facex_free(FaceX* fx);

// Version string.
const char* facex_version(void);

Architecture

Input: 112x112 RGB float32
    ↓
  Stem: Conv 3→32, stride 4
    ↓
  Stage 0: 3× ConvNeXt blocks (C=32)
    ↓
  Stage 1: 2× ConvNeXt + XCA attention (C=64)
    ↓
  Stage 2: 8× ConvNeXt + XCA attention (C=100)
    ↓
  Stage 3: 2× ConvNeXt + XCA attention (C=192)
    ↓
  Global Average Pool → LayerNorm → FC → L2 Norm
    ↓
Output: 512-dim embedding

Engine internals:

  • Pure C99 + SIMD intrinsics (AVX2, FMA, AVX-512, VNNI)
  • INT8 quantized GEMM with vpmaddubsw (AVX2) / vpdpbusd (VNNI)
  • FP32 packed column-panel MatMul (NR=8 AVX2, NR=16 AVX-512)
  • Custom thread pool with work-stealing (WaitOnAddress / futex)
  • Exact GELU via polynomial erf approximation (A&S 7.1.26)
  • Pre-packed weights at load time for cache-optimal access
  • Optional AES-256-CTR weight encryption with hardware binding

Weight encryption

For commercial deployment with IP protection:

# Encrypt weights (binds to target machine hardware)
./facex-encrypt encrypt weights.bin weights.enc "LICENSE-KEY"

// Load encrypted weights (C)
FaceX* fx = facex_init("weights.enc", "LICENSE-KEY");

Wrong key or different machine → load fails. Original weights never touch disk in plaintext on the target machine.


Integration paths

| Language | Method | Latency |
|---|---|---|
| C / C++ | libfacex.a + facex.h | 3 ms (native) |
| Browser | facex.wasm (48 KB) | 7 ms (WASM SIMD) |
| Go | go/facex subprocess | ~4 ms |
| Python | subprocess / ctypes | ~4 ms |
| Any | facex-cli --server stdin/stdout | ~4 ms |

Limitations

  • x86-64 only. AVX2 required, AVX-512 optional. ARM NEON port planned for Q3 2026.
  • Embedding only. Face detection and alignment are separate steps.
  • Single model. EdgeFace-XS (1.77M params). Other models need weight conversion.
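
Since AVX2 is a hard requirement, an application may want to check the CPU before loading the engine. The sketch below uses the GCC/Clang built-ins `__builtin_cpu_init` and `__builtin_cpu_supports`. `CpuFeatures`, `detect_cpu_features`, and `cpu_meets_minimum` are hypothetical names, and treating FMA as required is an assumption based on the `-mfma` build flag shown earlier.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int avx2;
    int fma;
    int avx512f;
} CpuFeatures;

/* Queries the running CPU. On non-x86 targets or other compilers,
 * every feature is reported as absent. */
static CpuFeatures detect_cpu_features(void) {
    CpuFeatures f = {0, 0, 0};
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.avx2    = __builtin_cpu_supports("avx2") != 0;
    f.fma     = __builtin_cpu_supports("fma") != 0;
    f.avx512f = __builtin_cpu_supports("avx512f") != 0;
#endif
    return f;
}

/* AVX2 is the documented minimum; FMA is assumed to be needed as well. */
static int cpu_meets_minimum(const CpuFeatures *f) {
    assert(f != NULL);
    return f->avx2 && f->fma;
}
```

Calling `detect_cpu_features()` once at startup and refusing to continue when `cpu_meets_minimum` returns 0 gives a clear error instead of an illegal-instruction crash.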

Model

Uses EdgeFace-XS by George et al.:

  • 1.77M parameters (smallest in its accuracy class)
  • 99.73% LFW, competitive with models 100x larger
  • Originally CC BY-NC-SA 4.0 license

Repo layout

include/
  facex.h               — public API (5 functions)
  weight_crypto.h       — encryption API
src/
  facex.c               — API implementation
  edgeface_engine.c     — forward pass (all stages + ops)
  transformer_ops.c     — SIMD kernels (LN, GELU, MatMul, Conv)
  gemm_int8_4x8c8.c    — INT8 GEMM microkernel (AVX2 + VNNI)
  threadpool.c/h        — lock-free thread pool
  weight_crypto.c       — AES-256-CTR encryption
go/facex/               — Go binding (subprocess protocol)
examples/
  example.c             — minimal usage example
docs/                   — SVG benchmarks, logo

FAQ

Q: Is it really faster than ONNX Runtime?
A: Yes. Measured on the same CPU, same model, same input: FaceX median 3.0 ms vs ONNX Runtime median 3.9 ms. The gap comes from handwritten SIMD kernels that avoid framework overhead.

Q: How does accuracy compare with ArcFace-R100?
A: EdgeFace-XS gets 99.73% on LFW vs ArcFace-R100's 99.80%. The 0.07-point gap buys you 10x speed and a 60x smaller model.

Q: Can I use this commercially?
A: The engine code is Apache 2.0 -- fully commercial. The bundled model weights are CC BY-NC-SA 4.0 (non-commercial). For commercial use, train your own weights or contact the author for licensing.

Q: Does it do face detection?
A: No. FaceX is the embedding step only. Pair it with any face detector (RetinaFace, SCRFD, YuNet, etc.) for a complete pipeline.


Citation

@software{facex2026,
  author  = {Atinov, Baurzhan},
  title   = {FaceX: Fast CPU Face Embedding Library},
  year    = {2026},
  url     = {https://github.com/facex-engine/facex}
}

License

Code: Apache License 2.0 -- free for commercial use. Model weights: CC BY-NC-SA 4.0 (follows upstream EdgeFace license). Train your own weights for unrestricted commercial use.

For commercial licensing: bauratynov@gmail.com


Created by Baurzhan Atinov (Kazakhstan)